February 17, 2023

Maven Slopes Challenge


Full challenge details - Introducing the Maven Slopes Challenge.
Link to GitHub repository, including data and final output. Download a copy at the bottom right corner above.
1. Project Title and Description
A tool to help skiers find their ideal destination. It will look at the inputs of the user and recommend resort(s) that match the requirements.

2. Problem
There are many ski resorts around the world with different features and seasons. Casual and new skiers may find it hard to pick a destination that suits their needs. Hence, I build an interactive tool to address this problem.

3. Data
This dataset contains 2 tables in CSV format:
The Resorts table contains information on 499 ski resorts around the world, including their location, slopes, lifts, prices, and ski season.
The Snow table contains supplemental data on the surface of the earth covered by snow for each month in 2022, by latitude & longitude.

Data cleaning:
1. There are random question marks in the resorts’ names => remove and replace with nothing.
BEFORE
AFTER
2. The season of each resort varies => to make comparison easier, the season periods are replaced with actual months. The month column is then split into individual month columns and unpivoted so that the months become values for filtering. Some resorts have an unknown season. To give them the benefit of the doubt, all twelve months are allocated to them. The final ranking relies on snow coverage data hence the months with high coverage should be within the resorts’ season.
REPLACE CHARACTERS WITH NUMBERS
SPLIT COLUMN BY DELIMITER
UNPIVOT COLUMNS
3. The pass price of each resort is grouped into buckets for ease of comparison.
GROUP PRICES INTO RANGES
4. If any resort has 0 km in Beginner/ Intermediate/ Difficult slopes, it is assumed that the resort does not feature those slopes.
ADD YES/NO COLUMNS
5. The regions in the Snow table have an area of 0.25x0.25 degrees of latitude and longitude => the latitudes and longitudes are combined to form unique coordinates. In the Resort table, latitudes and longitudes are rounded to the nearest region and combined. Finally, the snow data is brought to the Resorts table using the Merge query function.
ROUNDING LATITUDE AND LONGITUDE
ROUNDED LATITUDE AND LONGITUDE
COMBINE INTO AREA COORDINATE
MERGING RESORTS AND SNOW TABLES USING LEFT OUTER JOIN
4. Result
A pivot table that shows the list of resorts, the snow coverage, the number of snow cannons and the total lift capacity per hour. Snow coverage is used to rank the resorts, from highest coverage to lowest coverage. Snow cannons act as a backup plan in case of low snowfall. The higher the lift capacity, the faster speed of moving to the starting points, and the more skiing time.

The columns Month, Continent, Country, Price, Beginner slopes, Intermediate slopes, Difficult slopes, Summer skiing, Night skiing, Snowpark, and Child friendly are used as slicers for users to narrow down the resort list.

5. Technical Details
The project used entirely Excel to encourage beginners in data analysis to participate in the challenge. The data cleaning process was done in the Power Query editor of Excel. It is possible to clean the data using entirely Excel formulas instead of Power Query but the process would be more tedious.

6. Challenges
In the beginning, I made the mistake of creating several filter tables for the months, area, and snow coverage. It was not possible because the data model became too complex and the relationships did not filter through correctly. I realised that it was much easier to bring all the data I needed into the Resorts tables.

The area coordinates follow a specific format with an increment of 0.25 starting from -0.875. It took some time to write the correct M code to round the coordinates in the Resorts table to the nearest region.

The season column in the Resorts table took a while to be transformed into months. I could not write an M code to automatically transform the values. Hence, I used the manual approach to match the seasons with the associated months with the ‘Column from example’ function.